
Re-enable time-dependent z-scoring for Flow Matching #1752

Open
satwiksps wants to merge 7 commits into sbi-dev:main from satwiksps:fm-z-scoring

Conversation

@satwiksps
Contributor

@satwiksps satwiksps commented Feb 3, 2026

Description

This PR re-introduces z-scoring for Flow Matching estimators using a time-dependent normalization approach.

As discussed in #1623, standard z-scoring at $t=0$ is problematic because the network input is noise, not data. This implementation interpolates the normalization statistics based on the time step $t$, ensuring the network always receives inputs with standard statistics:

$$\mu_t = t \cdot \mu_{data}$$ $$\sigma_t = \sqrt{t^2 \sigma_{data}^2 + (1-t)^2}$$
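For intuition, here is a minimal boundary check of this formula (a sketch, not the PR's code; mean_data and std_data are illustrative names): at t=0 the normalized input has the statistics of the noise (mean 0, std 1), and at t=1 it has the data statistics.

import torch

def time_dependent_stats(t, mean_data, std_data):
    # interpolated normalization statistics for a scalar t in [0, 1]
    mu_t = t * mean_data
    sigma_t = torch.sqrt((t * std_data) ** 2 + (1 - t) ** 2)
    return mu_t, sigma_t

mean_data, std_data = torch.tensor(3.0), torch.tensor(2.0)
print(time_dependent_stats(torch.tensor(0.0), mean_data, std_data))  # (0., 1.): pure noise stats
print(time_dependent_stats(torch.tensor(1.0), mean_data, std_data))  # (3., 2.): data stats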

Related Issues/PRs

Changes

  1. sbi/neural_nets/net_builders/vector_field_nets.py: Updated build_vector_field_estimator to calculate the training data statistics (mean and std) and pass them to the estimator.
  2. sbi/neural_nets/estimators/flowmatching_estimator.py:
    • Registered mean_1 and std_1 as buffers (initialized as floats to ensure type consistency).
    • Updated forward() to apply the time-dependent z-scoring formula.
    • Rescaling: the output vector field is rescaled by $\sigma_t$. Since the network predicts the vector field in the normalized space, multiplying its output by $\sigma_t$ maps it back to the original data space, so the network's targets remain well-conditioned (unit-variance scale) throughout the integration (a short sketch of this forward pass is given after this list).
  3. tests/linearGaussian_vector_field_test.py: Added a new integration test (test_fmpe_time_dependent_z_scoring_integration) to verify that statistics are correctly learned and the forward pass executes without errors.
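For illustration, here is a minimal sketch of the forward pass described in item 2 (schematic and simplified, not the PR's implementation; the buffer names mean_1 and std_1 follow the description above, everything else is hypothetical):

import torch
from torch import nn

class TimeDependentZScoreWrapper(nn.Module):
    """Schematic: z-score the input with time-dependent stats, rescale the output."""

    def __init__(self, net, mean_1, std_1):
        super().__init__()
        self.net = net
        # register the data statistics as buffers so they follow .to(device) and state_dict
        self.register_buffer("mean_1", torch.as_tensor(mean_1, dtype=torch.float32))
        self.register_buffer("std_1", torch.as_tensor(std_1, dtype=torch.float32))

    def forward(self, input, condition_emb, time):
        t = time.view(-1, *([1] * (input.ndim - 1)))
        mu_t = t * self.mean_1
        sigma_t = torch.sqrt((t * self.std_1) ** 2 + (1 - t) ** 2)
        v_norm = self.net((input - mu_t) / sigma_t, condition_emb, time)
        # the net predicts the field in normalized space; map it back to data space
        return v_norm * sigma_t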

Verification

  • Integration Test: Added a specific test case that confirms mean_1 and std_1 are populated and the ode_fn runs correctly.
  • Benchmarks: I ran the sbi benchmarks locally (pytest --bm --bm-mode fmpe) to check for stability and performance. All 12 tests passed successfully (screenshot attached below).

[Screenshot: mini SBIBM results]

@satwiksps satwiksps marked this pull request as ready for review February 3, 2026 19:51
@codecov

codecov bot commented Feb 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 88.07%. Comparing base (937efc2) to head (da03aad).
⚠️ Report is 4 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1752      +/-   ##
==========================================
- Coverage   88.54%   88.07%   -0.48%     
==========================================
  Files         137      137              
  Lines       11515    12258     +743     
==========================================
+ Hits        10196    10796     +600     
- Misses       1319     1462     +143     
Flag Coverage Δ
fast 84.58% <100.00%> (?)

Flags with carried forward coverage won't be shown.

Files with missing lines Coverage Δ
...i/neural_nets/estimators/flowmatching_estimator.py 97.05% <100.00%> (+0.39%) ⬆️
sbi/neural_nets/net_builders/vector_field_nets.py 93.16% <100.00%> (ø)

... and 10 files with indirect coverage changes

@satwiksps
Contributor Author

It seems tests/torchutils_test.py::TorchUtilsTest::test_searchsorted is consistently failing in the CI with an execnet.gateway_base.DumpError.

Since this failure is in torchutils_test.py (which I haven't touched) and appears to be a serialization issue with pytest-xdist masking a local assertion error, I believe it is unrelated to my changes in flowmatching_estimator.py?

The actual Flow Matching benchmarks and integration tests for this PR passed successfully, though.

This was an old bug that likely surfaced now because codecov was trying to serialize things.
@janfb
Contributor

janfb commented Feb 5, 2026

It seems tests/torchutils_test.py::TorchUtilsTest::test_searchsorted is consistently failing in the CI with an execnet.gateway_base.DumpError.

Since this failure is in torchutils_test.py (which I haven't touched) and appears to be a serialization issue with pytest-xdist masking a local assertion error, I believe it is unrelated to my changes in flowmatching_estimator.py?

The actual Flow Matching benchmarks and integration tests for this PR passed successfully, though.


Yes, this is unrelated and popped up here by chance or because of an unrelated change in a downstream package. I pushed a fix to this branch ✅

@janfb
Contributor

janfb commented Feb 5, 2026

Thanks for working on this @satwiksps !

Overall, this looks exactly right. However, after reviewing the code and tracing through the flow matching implementation, I believe the z-scoring formula is inverted relative to the interpolation convention (quite confusing!).

The interpolation in the loss function is:

theta_t = (1 - t) * theta_data + t * theta_noise

So the expected input mean at each time is:

E[θ_t] = (1-t) * mean_data + t * 0 = (1-t) * mean_data

Current PR formula:

  • mu_t = t * mean_1
  • var_t = (t * std_1)² + (1 - t)²

This gives mu_t = 0 at t=0 and mu_t = mean_data at t=1 — exactly backwards.

Correct formula should be:

  • mu_t = (1 - t) * mean_1
  • var_t = ((1 - t) * std_1)² + t²

The two formulas agree only at t=0.5 and differ most at the boundaries.

Note on zuko's sampling: I had to dig a bit but in zuko, NormalizingFlow.sample() uses transform.inv() which integrates backward (t1→t0), so training and sampling conventions do align — the issue is purely the z-scoring formula.

To verify this, I suggest the following test: the standard linear Gaussian test, but with a uniform prior between 95 and 100 and with data x_o centered at 100 (far from N(0,1)). With the inverted formula, C2ST should degrade significantly compared to no z-scoring, and it should be fixed (C2ST close to 0.5) with the correct formula.

Can you confirm this (maybe I got confused with the integration directions after all)?


# call the network to get the estimated vector field
v = self.net(input, condition_emb, time)
t_view = time.view(-1, *([1] * (input.ndim - 1)))
Contributor

@manuelgloeckler manuelgloeckler Feb 5, 2026


Assuming a Gaussian target at t=1 with the given mu1 and std1, the exact marginal velocity would take the following form:

# ---- marginal Gaussian stats (alpha=t, sigma=1-t, diag C = s1^2) ----
mu_t  = t_view * m                                          # \bar{mu}_t
var_t = (t_view.square() * s1_sq) + one_minus_t.square()     # diag(S_t)
std_t = var_t.sqrt().clamp_min(self.eps)

# ---- z-scoring scaling for the net (as currently done) ----
x_centered = x - mu_t
x_norm = x_centered / std_t                                 # c_in * (x - mu_t)

resid_norm = self.net(x_norm, condition_emb, t)             # f_theta(...)
resid = resid_norm * std_t                                  # c_out * f_theta

# ---- Gaussian posterior mean E[x1 | xt=x] under diag prior ----
# k_t = alpha * C / S_t  with alpha=t and C=s1^2 (diagonal)
k_t = (t_view * s1_sq) / var_t
x1_hat = m + k_t * x_centered                               # m + k_t (x - t m)

# ---- Gaussian affine baseline: a(t)=t, b(t)=1-t ----
u_gauss = (t_view * x) + (one_minus_t * x1_hat)

Note that this is only with respect to the "prior" (i.e., not the posterior), but it might still be reasonable.

Contributor

@manuelgloeckler manuelgloeckler left a comment


Hey @satwiksps !

Thanks for the contribution! I compared against main, and as of now this performs very similarly on average, if not slightly worse, than before (although I think that is mostly fine for these tasks).

[Image: benchmark comparison against main]

I wonder if it would make sense to improve the "preconditioning" a bit more (see comments).

@janfb
Contributor

janfb commented Feb 6, 2026


Thanks for adding the comparison to main. What could happen here is that the benchmarking tasks are not discriminative w.r.t. z-scoring, no? I.e., we need a task that benefits from z-scoring.

@janfb
Contributor

janfb commented Feb 6, 2026

Alright, I looked at it again and realized that my proposal was actually incorrect. The formulas I proposed would result in total normalization, i.e., "independent" z-scoring, where all time steps have zero mean after z-scoring and we lose valuable time-dependent information - sorry @satwiksps, your formulas were actually correct!

What Manuel proposed is great: we z-score with respect to the Gaussian baseline, i.e., what one would expect if the posterior were actually Gaussian. The flow matching network then only has to learn the residual from this ideal baseline (please correct me @manuelgloeckler if this intuition is inaccurate).

I tested this locally with the following setup:

  • Prior: BoxUniform([95, 105]), x_o=100
  • Simulator: x = theta + 0.5 * noise
  • Reference posterior: N(x_o, 0.5²I)
  • 3000 simulations.
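A rough sketch of this setup (assuming sbi's FMPE interface and c2st metric; the dimensionality and the way each z-scoring variant is selected are illustrative and omitted here):

import torch
from sbi.inference import FMPE
from sbi.utils import BoxUniform
from sbi.utils.metrics import c2st

# prior and observation (2-d for illustration)
prior = BoxUniform(low=95.0 * torch.ones(2), high=105.0 * torch.ones(2))
x_o = 100.0 * torch.ones(2)

# simulator: x = theta + 0.5 * noise, 3000 simulations
theta = prior.sample((3000,))
x = theta + 0.5 * torch.randn_like(theta)

inference = FMPE(prior=prior)  # the z-scoring variant would be configured via the estimator builder
inference.append_simulations(theta, x).train()
posterior = inference.build_posterior()
samples = posterior.sample((1000,), x=x_o)

# reference posterior: N(x_o, 0.5^2 I)
reference = x_o + 0.5 * torch.randn(1000, 2)
print(c2st(samples, reference))  # close to 0.5 means samples match the reference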

Results:

Formula                  C2ST   Description
Gaussian                 0.631  Gaussian baseline + residual learning
var_only                 0.772  Variance scaling only
pr                       0.774  PR's time-dependent z-scoring
static                   0.796  Static mean subtraction
none                     0.865  No z-scoring
independent z-scoring    0.922  "Correct" mean formula

Thus, @satwiksps, I suggest you implement both options (your proposal and Manuel's), add the setup above as a new z-scoring test, and confirm the results.
@manuelgloeckler I think it would be good to have both options, as the Gaussian baseline assumption can be suboptimal when the posterior is multi-modal or skewed?

@manuelgloeckler
Contributor

@janfb The preconditioning is with respect to the "prior", not the posterior (as that would require a regression from x). I don't think it will "hurt" in almost all cases: FM nets are initialized to output zero, so the initialized network effectively samples from a mass-covering Gaussian approximation of the prior, and everything else needs to be learned.

Nonetheless having an option to disable it is always good.

Agreed that the benchmark tests are not really sensitive to the z-scoring, but since we usually enable z-scoring by default, it shouldn't hurt performance even when it is not necessary. As said, the deviation is small enough to be fine (and might improve with the additional baseline).



Development

Successfully merging this pull request may close these issues.

Add back z-scoring for flow matching
